Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (Position Paper)
Authors
Abstract
Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse matrix-vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number of automatic tuning systems have been developed, which typically operate by (1) generating multiple implementations of a kernel, and (2) empirically selecting an optimal implementation. One such system is FFTW (Fastest Fourier Transform in the West) for the discrete Fourier transform. In this paper, we review FFTW's inner workings with an emphasis on its code generator, and report on our empirical evaluation of the system on two different hardware and compiler platforms. We then describe a number of our own extensions to the FFTW code generator that compute efficient discrete cosine transforms and show promising speed-ups over a vendor-tuned library. We also comment on current opportunities to develop tuning systems in the spirit of FFTW for other widely used kernels.
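The two-step scheme named in the abstract (generate several implementations, then select one empirically) can be illustrated with a minimal Python sketch. This is not FFTW's code generator or planner. The two hand-written candidates, the function names (dft_naive, fft_radix2, select_fastest), and the timing policy are all assumptions made for illustration; a real tuning system would generate its candidates automatically and in far greater number.

```python
import cmath
import time


def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Hypothetical candidate set standing in for generated implementations.
CANDIDATES = {"dft_naive": dft_naive, "fft_radix2": fft_radix2}


def select_fastest(candidates, x, repeats=3):
    """Time each candidate on x; return the fastest name and all timings."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(x)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    signal = [complex(i % 7, 0) for i in range(256)]
    winner, timings = select_fastest(CANDIDATES, signal)
    print("selected:", winner)
```

The selection is made by measurement on the target machine rather than by an analytical cost model, which is the property the abstract attributes to this class of tuning systems. The winner may differ between machines, input sizes, and runs.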
Similar resources
Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (Position Paper)
Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse matrix-vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number of automatic tuning systems have been developed, which typically operate by (1) generating multiple implementations of a kernel, and (2) empirically selecting an optimal implementation. One ...
Automatic Generation and Adaptation of Numerical Kernels
Designing software that achieves peak performance on modern architectures is a difficult, expensive and often highly platform specific task. In this paper we discuss recent automatic adaptive optimization approaches to high-performance programming: ATLAS, FFTW, and SPIRAL. They are designed to eliminate hand-coding and hand-tuning for various numerical kernels. Further, we describe our own work...
Automatic Generation of Sparse Tensor Kernels with Workspaces
Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. These are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators...
Automatic Coarse-grain Partitioning and Automatic Code Generation for Heterogeneous Architectures
Real-time signal, image, and control applications have very important time constraints, involving the use of several powerful numerical calculation units. The aim of our work is to develop a fast and automatic prototyping process dedicated to parallel architectures made of both PC and several last generation Texas Instruments digital signal processors: TMS320C6X DSP. The process is based on Syn...
Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance
We demonstrate the flexibility and ease of use of C++ algorithmic differentiation (AD) tools based on overloading to numerical patterns (kernels) arising in computational finance. While adjoint methods and AD have been known in the finance literature for some time, there are few tools capable of handling and integrating with the C++ codes found in production. Adjoint methods are also known to b...
Journal title:
Volume, issue
Pages -
Publication date: 2006